inflated estimates of agreement appeared likely.
Aggregation Bias in Estimates of Perceptual Agreement

The extent to which individuals agree with respect to
perceptions of various aspects of their work environments has
been addressed in a number of climate and climate-related
studies (cf. Bass, Valenzi, Farrow, & Solomon, 1975; Drexler,
1977; Gavin & Howe, 1975; Howe, 1977; James, Demaree, & Hater,
1980; Jones & James, 1979; Litwin & Stringer, 1968; Payne &
Mansfield, 1973; Payne & Pheysey, 1971; Pritchard & Karasick,
1973; Schneider, 1972; Schneider & Bartlett, 1970; Schneider &
Snyder, 1975; Campbell & Beaty, Note 1; Curtis, Note 2; Hater,
Note 3). Reviews of these studies indicate that the range of
estimates of perceptual agreement among individuals is .00 to
.50, with a median of approximately .12 (James, Hater, Gent,
& Bruni, 1978; James & Sells, in press; Jones & James, 1979;
Hater, Note 3). These reviews were based on estimates of
interrater reliability for a single rater (intraclass
correlations) and proportions of variance in individuals'
perceptions associated with variation among environments
(eta-squares and omega-squares). Not included were estimates of
reliabilities of mean perceptions per environment (e.g.,
Spearman-Brown corrected intraclass correlations; see Jones and
James, 1979, for a discussion of this issue) and estimates
subject to aggregation bias (see below).

An estimate of agreement at the higher end of the range of
agreement values is an eta-square ($\eta^2$) of .42 reported by
Drexler (1977). However, given the number of studies reporting
much lower values, one would expect current reviews of
climate to question the likelihood of perceptual agreement.
Such is often not the case. Woodman and King (1978) considered
as unresolved the question of what attributes (organizational
versus individual) are measured by climate perceptions. The key
study referenced to support the organizational attribute position
was Drexler (1977). Landy and Trumbo (1980) went a step further
and suggested that climate perceptions reflected sufficient
agreement and consistency at the individual level to justify
their use as descriptors of organizational climate. Drexler
(1977) was the key supporting reference for perceptual consistency.
Another case of selective attention, again based on Drexler
(1977), was provided by Schneider, Parkington, and Buxton (1980,
p. 254), who stated, "The assumption of agreement in perceptions
has been demonstrated empirically and allows for the aggregation
of data within settings, facilitating studies across settings
(Drexler, 1977)."

Perhaps the fascination with the Drexler article stems from
the fact that it was based on a large sample of individuals and
groups (6,996 individuals, 1,256 workgroups) from 21 diverse
organizations. It is unfortunate, therefore, that the reported
$\eta^2$ of .42 was subject to an aggregation bias, which suggests
that conclusions drawn by Drexler and others regarding perceptual
agreement and consistency are based on an inflated estimate of
variance in individuals' perceptions accounted for by
organizations. The initial objective of this article is to
demonstrate how an aggregation bias led to an inflated $\eta^2$ in
the Drexler analysis. Moreover, inasmuch as aggregates are all too
often used to estimate perceptual agreement, the inflation of
agreement estimates resulting from aggregation bias is illustrated
in other climate and nonclimate studies.

Aggregation  Bias  in  the  Drexler  Study 

Drexler (1977) concluded not only that "42.2% of the variance
in climate could be accounted for by organization" (p. 40), but
also that James and Jones' (1974) use of the term "psychological
climate" is "misleading if it connotes a construct that is
largely intraindividual" (p. 41, italics added). These
conclusions would lead one to believe that 42% of the variance in
individuals' climate perceptions had been accounted for by
differences in the 21 organizations. However, Drexler did not
employ individuals' climate perceptions as the dependent variable.
The dependent variable was mean perceptual scores per workgroup,
which is to say that the $\eta^2$ of .42 was based on a
design that employed K = 21 organizations as the independent
variable and 1,256 group means, nested within 21 organizations,
as scores on the dependent variable. Consequently, interpretation
of the $\eta^2$ of .42 as if it had been calculated on individuals'
perceptions almost assuredly provided an inflated estimate of
agreement at the individual level. As discussed below, this is
a form of aggregation bias known as the "ecological fallacy"
(cf. Hannan, 1971, 1973; Roberts, Hulin, & Rousseau, 1978;
Robinson, 1950).

An ecological fallacy occurs when relationships, or functions
of relationship indicators such as $\eta^2$, among individual level
data are inferred from relationships among calculated aggregates
of individual level data. Typically, relationships among
aggregates provide inflated, technically spurious (cf. Hannan,
1971), estimates of relationships among individual level data.
This is easily shown statistically for the Drexler data. Consider
first the following three variance terms: (a) $\sigma_I^2$, the
variation of the 6,996 individual perceptions about the grand
mean of all individuals' perceptions (G); (b) $\sigma_{wm}^2$, the
variation of the 1,256 mean workgroup scores about G; and
(c) $\sigma_O^2$, the variation of the mean organizational scores
about G.¹ We may now derive three $\eta^2$s, namely:
(a) $\eta_1^2 = \sigma_O^2 / \sigma_{wm}^2$, the
proportion of variance in mean workgroup scores accounted for
by differences in organizations;
(b) $\eta_2^2 = \sigma_O^2 / \sigma_I^2$, the
proportion of variance in individuals' perceptions accounted for
by differences in organizations; and
(c) $\eta_3^2 = \sigma_{wm}^2 / \sigma_I^2$, the
proportion of variance in individuals' perceptions accounted for
by differences in workgroups.

An estimator of agreement among individuals is $\eta_2^2$.
Drexler's estimate of .42 was, however, predicated on $\eta_1^2$. The
potential for an ecological fallacy is made evident by algebraic
derivation, which shows that $\eta_2^2 = \eta_1^2 \eta_3^2$. This
equation suggests that the appropriate estimate of perceptual
agreement ($\eta_2^2$) will be equal to the Drexler estimate
($\eta_1^2$) only in the condition that 100% of the variance in
individuals' perceptions is accounted for by differences in
workgroups (i.e., $\eta_3^2 = 1.0$). If $\eta_3^2$ is less than 1.0,
then it follows that $\eta_2^2$ is less than $\eta_1^2$, and the
Drexler approach provides an inflated estimate of agreement at the
individual level.
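The identity can be checked numerically. The following is a minimal sketch, not a reanalysis of the Drexler data: it simulates a small balanced nested design (equal group sizes and equal numbers of groups per organization, as assumed in Footnote 1) and computes the three variance terms about the grand mean. The function name, sample sizes, and effect sizes are hypothetical choices made only for illustration.

```python
import random


def eta_squares(n_orgs=5, groups_per_org=8, n_per_group=6, seed=0):
    """Simulate a balanced nested design; return (eta1_sq, eta2_sq, eta3_sq)."""
    rng = random.Random(seed)
    scores = []  # scores[o][g] holds the individual scores of one workgroup
    for _ in range(n_orgs):
        org_effect = rng.gauss(0, 1.0)          # hypothetical organization effect
        org = []
        for _ in range(groups_per_org):
            group_effect = rng.gauss(0, 1.0)    # hypothetical workgroup effect
            org.append([org_effect + group_effect + rng.gauss(0, 2.0)
                        for _ in range(n_per_group)])
        scores.append(org)

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    individuals = [x for org in scores for grp in org for x in grp]
    grand = mean(individuals)
    group_means = [mean(grp) for org in scores for grp in org]
    org_means = [mean(x for grp in org for x in grp) for org in scores]

    # Variation of each kind of score about the grand mean G.
    var_i = mean((x - grand) ** 2 for x in individuals)    # sigma_I^2
    var_wm = mean((m - grand) ** 2 for m in group_means)   # sigma_wm^2
    var_o = mean((m - grand) ** 2 for m in org_means)      # sigma_O^2

    return var_o / var_wm, var_o / var_i, var_wm / var_i


eta1_sq, eta2_sq, eta3_sq = eta_squares()
print(eta1_sq, eta2_sq, eta1_sq * eta3_sq)  # last two values coincide
```

Because workgroups never agree perfectly in the simulated data, `eta3_sq` is below 1.0 and the group-mean statistic `eta1_sq` exceeds the individual-level statistic `eta2_sq`.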

The degree of bias in the Drexler estimate can only be
ascertained by a reanalysis of the data (i.e., compute $\eta_2^2$
instead of $\eta_1^2$). It is assumed that the bias would be
sizeable. This assumption is based on a study by Bass et al.
(1975), which reported the highest levels of $\eta_3^2$ observed
recently. The largest reported value of $\eta_3^2$ for an
organizational level variable was .504 (external environment in the
library directory sample; see Table 3). The median value of
$\eta_3^2$ for organizational level variables was .325. One might
now extrapolate, and as a heuristic exercise insert these values as
estimates in the equation $\eta_2^2 = \eta_1^2 \eta_3^2$. With
$\eta_1^2 = .422$ (the Drexler value), we find $\eta_2^2$ equal to
.213 and .137 for the highest value (.504) and the median value
(.325) of $\eta_3^2$, respectively.
Interestingly, these estimates, particularly the latter, are in
line with the median value of perceptual agreement found in the
reviews by Hater (Note 3), James et al. (1978), James and Sells
(in press), and Jones and James (1979).
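The heuristic arithmetic can be reproduced in a few lines. This sketch simply multiplies the Drexler value by the two Bass et al. values quoted above; the function name is a hypothetical label, and the inputs are the figures reported in the text rather than results of any reanalysis.

```python
def individual_level_eta_sq(eta1_sq, eta3_sq):
    """eta_2^2 = eta_1^2 * eta_3^2 (balanced design assumed)."""
    return eta1_sq * eta3_sq


drexler_eta1_sq = 0.422
for label, eta3_sq in [("highest", 0.504), ("median", 0.325)]:
    estimate = individual_level_eta_sq(drexler_eta1_sq, eta3_sq)
    print(label, round(estimate, 3))  # 0.213 for the highest, 0.137 for the median
```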

The statistics above were based on $\eta^2$ because Drexler
employed $\eta^2$. A reviewer suggested another approach for
demonstrating aggregation bias. This approach consists of
viewing $\eta_1^2$ (= .42) as a reliability of mean scores per
workgroup resulting from an upward adjustment in an intraclass
correlation for the average number of raters per workgroup. The
logic here is that the intraclass correlation (ICC) is an estimate
of interrater reliability at the level of the individual rater (cf.
Shrout & Fleiss, 1979), and can be estimated by the Spearman-Brown
(SB) prophecy equation given knowledge of the reliability
of group means and the average number of individuals per
workgroup (5.57 = 6,996 individuals/1,256 workgroups). The equation
is .42 = 5.57 ICC/(1 + 4.57 ICC); the resulting estimate of ICC
is .12. This estimate is about the same as the .137 suggested
above for $\eta_2^2$ based on the median value of $\eta_3^2$.
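Solving the SB equation for the single-rater ICC gives ICC = r / (k - (k - 1) r), where r is the reliability of the means and k is the average number of raters per mean. The sketch below applies this rearrangement to the figures quoted above; the function names are hypothetical, and treating $\eta_1^2$ as a reliability of means is the reviewer's suggestion, not an established property of the Drexler data.

```python
def spearman_brown(icc, k):
    """Reliability of a mean of k raters, given the single-rater ICC."""
    return k * icc / (1 + (k - 1) * icc)


def icc_from_mean_reliability(rel_of_means, k):
    """Invert the Spearman-Brown equation to recover the single-rater ICC."""
    return rel_of_means / (k - (k - 1) * rel_of_means)


k = 6996 / 1256                      # average raters per workgroup, about 5.57
icc = icc_from_mean_reliability(0.42, k)
print(round(icc, 3))                 # about .115, i.e., .12 after rounding
print(round(spearman_brown(icc, k), 2))  # recovers the starting value of .42
```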

In conclusion, it appears that the Drexler (1977) results
were subject to an aggregation bias and that, once this bias is
corrected, the results are consistent with other studies of
perceptual agreement. Hopefully, this will stimulate some current
reviewers of organizational climate to reconsider their
conclusions regarding perceptual agreement, perhaps by broadening
the scope of their reviews to include not only other climate
studies but also the reasons that individuals in the same
organization might cognitively construct somewhat different
perceptions (cf. Ekehammar, 1974; James & Jones, 1976; James et
al., 1978; Payne & Mansfield, 1973).

Aggregation Bias in Other Studies of Perceptual Agreement


Estimates of agreement based on group mean scores have been
incorrectly interpreted as applying to individuals' perceptions
in a number of other studies. Specific reference is directed
to studies of perceptual agreement among members of different
roles (e.g., supervisors-subordinates, incumbents-observers,
customers-employees) in which either (a) the perception of a
member of one role (e.g., a supervisor) is correlated with the
aggregate perception of two or more members of another role (e.g.,
subordinates) on a sample of K role sets, or (b) two sets of
aggregate perceptions are correlated (e.g., mean customer scores
and mean employee scores on a sample of K organizations). The
former procedure is referred to as the "single aggregate approach,"
and the latter the "double aggregate approach." Each approach
is discussed briefly below.

Single aggregate approach. This approach is illustrated
by the following studies: (a) Evans (1972), who measured
agreement among perceptions of leader behavior by correlating
supervisors' self-descriptions with means of subordinates'
descriptions (using groups as observations); (b) Oldham (1976),
where agreement was assessed by correlating focal managers'
perceptions of their own motivational strategies with means of
subordinates' perceptions; and (c) Schneider (1972) and Schneider
and Bartlett (1970), who estimated "agreement on climate
perceptions across positions" by correlating the mean climate
perceptions of agents with the climate perceptions of agency
managers on a sample of life insurance agencies. While none of
these studies reported particularly high levels of agreement, it is
nevertheless likely that the estimates were inflated. This is
because the use of aggregates (means) of subordinates' (agents')
perceptions deleted from the analysis within-group (within-agency)
variance in subordinates' (agents') perceptions. That variance is,
of course, error variance.


The likelihood of inflated estimates of agreement is
demonstrated statistically by employing an analogue of an equation
presented by James et al. (1980, Eq. 5) to assess relationships
between situational variables and individual variables. For
example, if the kth manager's climate score ($Y_k$) is assigned to
all agents in the agency (k = 1, ..., K agencies), then the
correlation between managers' climate scores and agents' climate
scores ($X_{jk}$; j = 1, ..., $n_k$ agents per agency), based on all
agents across the sample of K agencies, takes the form:
$r_{yx} = \eta_x r_{y\bar{x}}$.² The correlation $r_{yx}$ is an
estimate of agreement between managers and agents at the
individual level of analysis (i.e., no scores have been
aggregated), the square of $\eta_x$ reflects the
proportion of variance in agents' climate scores accounted for
by differences in the K agencies, and $r_{y\bar{x}}$ is the
correlation between managers' climate scores and mean climate
scores for agents, based on the total agent sample. Given equal
$n_k$, the latter correlation, $r_{y\bar{x}}$, provides the same
value as the statistic used by Evans (1972), Oldham (1976),
Schneider (1972), and Schneider and Bartlett (1970) to compute
estimates of perceptual agreement. Note that $r_{yx}$ will be equal
to $r_{y\bar{x}}$ only in the condition that $\eta_x = 1.0$, which
suggests that all agents in each agency agreed perfectly (i.e.,
there is no within-agency variance in agents' climate scores).

It is extremely unlikely that an $\eta_x$ will equal 1.0.
Consequently, given that $\eta_x$ is less than 1.0, it follows that
$r_{yx}$ is less than $r_{y\bar{x}}$. Thus, $r_{y\bar{x}}$ provides
an inflated estimate of perceptual agreement at the individual
level of analysis.
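The relationship $r_{yx} = \eta_x r_{y\bar{x}}$ can be illustrated with simulated data. The sketch below is only an illustration under stated assumptions: equal numbers of agents per agency, normally distributed scores, and arbitrary variances. The function names and all numeric settings are hypothetical and are not taken from any of the studies cited.

```python
import math
import random


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def single_aggregate_demo(n_agencies=40, agents_per_agency=8, seed=1):
    """Return (r_yx, r_y_xbar, eta_x) for one simulated sample of agencies."""
    rng = random.Random(seed)
    manager, agents = [], []
    for _ in range(n_agencies):
        climate = rng.gauss(0, 1.0)                   # shared agency climate
        manager.append(climate + rng.gauss(0, 1.0))   # manager's score Y_k
        agents.append([climate + rng.gauss(0, 1.5)    # agents' scores X_jk
                       for _ in range(agents_per_agency)])

    agent_means = [sum(a) / len(a) for a in agents]

    # Individual level: each agent is paired with his or her manager's score.
    y_ind = [manager[k] for k in range(n_agencies) for _ in agents[k]]
    x_ind = [x for a in agents for x in a]
    r_yx = pearson(y_ind, x_ind)

    # Single aggregate: manager scores against mean agent scores.
    r_y_xbar = pearson(manager, agent_means)

    # eta_x: square root of between-agency SS over total SS in agents' scores.
    grand = sum(x_ind) / len(x_ind)
    ss_between = sum(len(a) * (m - grand) ** 2
                     for a, m in zip(agents, agent_means))
    ss_total = sum((x - grand) ** 2 for x in x_ind)
    eta_x = math.sqrt(ss_between / ss_total)
    return r_yx, r_y_xbar, eta_x


r_yx, r_y_xbar, eta_x = single_aggregate_demo()
print(r_yx, eta_x * r_y_xbar)  # the two values coincide when n_k is equal
```

With any within-agency disagreement among agents, `eta_x` falls below 1.0 and the aggregate correlation is larger in magnitude than the individual-level correlation.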

Double aggregate approach. Examples of this approach are
seen in the following studies: (a) Hackman and Lawler (1971),
Hackman and Oldham (1975), Hackman, Pearce, and Wolfe (1978),
and Oldham, Hackman, and Pearce (1976), who, for example, assessed
agreement between supervisors' and subordinates' perceptions of
a job dimension by correlating mean supervisory perceptions with
mean subordinate perceptions, using jobs as the sample;³ (b)
Ilgen and Fujii (1976), who, after showing that observers and group
members did not agree on descriptions of leader behavior at the
individual level of analysis, proceeded to reestimate agreement
by computing correlations between mean observer perceptions and
mean group member perceptions on a sample of groups; and (c)
Schneider and Snyder (1975), who correlated means of managers'
climate perceptions and means of trainees' climate perceptions
on a sample of life insurance agencies to test the hypothesis
that "people in an organization should agree more on their
description of the climate than on their feelings of job
satisfaction" (p. 319, italics added to emphasize that the level of
interpretation is individuals).

Basing estimates of agreement on double aggregates is an
exacerbation of the problem with single aggregates. That is,
rather than deleting within-group (within-job, within-agency)
error variance for one set of groups, it is now being deleted
for two sets of groups. Thus, unless the $\eta^2$ for each group is
equal to 1.0 (i.e., there is no within-group variation for either
group), a correlation of aggregates will provide an inflated
estimate of agreement among individuals.

The substance, although not the precise form, of the
statistical bias resulting from using double aggregates to estimate
perceptual agreement among individuals is illustrated in the
following scenario. The basic question addressed here is whether
individuals who rated the same job agreed. Suppose that we have
a sample of 100 jobs and 10 different raters for each job, where
the raters may be job incumbents, supervisors, observers, etc.
An estimate of perceptual agreement among individuals is then
calculated using the ICC procedure to provide an interrater
reliability at the level of the individual rater.⁴ Now suppose
that we (a) randomly split the 10 raters for each job into two
groups of five, (b) calculate a mean for each group of five, and
(c) correlate the means using the 100 jobs as the sample. If
the ICC is arbitrarily set at .30, then the correlation among
means may be estimated by applying the SB equation, using a
correction factor of five (i.e., five scores per mean). The
resulting value is .68, but this is an estimate of the
reliability of means and a highly inflated estimate of agreement
at the individual level.
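The scenario can be sketched in code: the SB value for five raters follows directly from the ICC of .30, and a simulated random split of ten raters per job gives a correlation among means close to that value. The simulation is an illustration only. It assumes normally distributed ratings with unit total variance and, to reduce sampling noise, uses a hypothetical 2,000 jobs rather than the 100 jobs of the scenario; the function names are likewise hypothetical.

```python
import math
import random


def spearman_brown(icc, k):
    """Reliability of a mean of k raters, given the single-rater ICC."""
    return k * icc / (1 + (k - 1) * icc)


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def split_half_mean_correlation(icc=0.30, n_jobs=2000, raters=10, seed=2):
    """Correlate the means of two random halves of the raters across jobs."""
    rng = random.Random(seed)
    first, second = [], []
    half = raters // 2
    for _ in range(n_jobs):
        job_effect = rng.gauss(0, math.sqrt(icc))   # true between-job variance
        ratings = [job_effect + rng.gauss(0, math.sqrt(1 - icc))
                   for _ in range(raters)]
        rng.shuffle(ratings)                        # (a) random split
        first.append(sum(ratings[:half]) / half)    # (b) mean of each half
        second.append(sum(ratings[half:]) / (raters - half))
    return pearson(first, second)                   # (c) correlate the means


print(round(spearman_brown(0.30, 5), 2))  # 0.68, the value given in the text
print(split_half_mean_correlation())      # close to the SB value, not to .30
```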
Conclusions

The moral of the story is simple. If one is assessing
perceptual agreement among individuals, then the appropriate
level of analysis is the individual. Furthermore, to avoid
misunderstanding, the discussion here was limited to the fallacy
of interpreting agreement estimates based on aggregates as
applying to agreement among individuals. The use of aggregates
for other purposes was not addressed. What justifies the
calculation of an aggregate, making sense out of what is measured
by an aggregate, and interpreting relationships among aggregates
are subjects that require considerable discussion and are prone
to debate. Compare, for example, Ilgen and Fujii's (1976) and
Katona's (1979) justifications for aggregate level analysis with
issues raised by Firebaugh (1978, 1980) concerning what is
measured by an aggregate and interpretation of relationships
among aggregates.
¹To simplify statistical developments, population
parameters were employed and it was assumed that (a) the number of
individuals in each workgroup was the same for all workgroups,
and (b) the number of workgroups in each organization was equal
for all organizations. Thus, for example, the grand mean of
individuals' scores was the same as the grand mean of mean
workgroup scores, and the means per organization of individuals'
scores and mean workgroup scores were equivalent. While
unrealistic in practice, these assumptions do not affect the logic
and conclusions of the statistical critique.

²See James et al. (1980) for assumptions (e.g., linearity)
to interpret this statistic. In the present application, it was
not assumed that the $Y_k$ was homogeneous with respect to agents,
although the $n_k$ were considered equal.

³Hackman and Oldham (1975, p. 164) referred to correlations
among means involving employees and both supervisors and
researchers as "indirect tests of the 'objectivity' of employee
ratings" rather than tests of agreement. It is also the case that
(a) these correlations were compared to estimates of agreement in
the Hackman and Lawler (1971) study, (b) Hackman and Lawler (1971,
p. 268) stated that it is not "possible to demonstrate
conclusively that employee judgments are objectively accurate,
because no unambiguous standard of accuracy is available", and
(c) later studies (Hackman et al., 1978; Oldham et al., 1976)
returned to the use of the term "agreement." It appeared
reasonable, therefore, to regard the Hackman and Oldham (1975)
estimates as agreement indices.

⁴The statistical model employed in this scenario involves
a random effects, one-way ANOVA and the ICC equation for
incomplete designs (cf. Shrout & Fleiss, 1979). A separate
component for type of rater (e.g., supervisor, subordinate) could
be included by using a more sophisticated design, such as a general
linear model with dummy variables to represent jobs and types
of raters, accompanied by appropriate interaction terms.
Nevertheless, the basic question is whether individuals rating the
same job agreed, which is the question addressed by the simple


